In [ ]:
    
%%HTML
<style>
.container { width:100% }
</style>
    
In [ ]:
    
import pandas as pd
import numpy  as np
    
In [ ]:
    
IrisDF = pd.read_csv('iris.csv')
IrisDF.head()
    
We extract the set of all species occurring in the DataFrame by converting the column IrisDF['species'] into a set, which removes the duplicates, and then turn this set back into a list.
In [ ]:
    
Species  = list(set(IrisDF['species']))
Species
    
We extract the feature names.  Conveniently, converting the DataFrame into a list yields the names of its columns.  However, we do not need the column 'species', since this is the dependent variable.  Fortunately, this column is the last element of the list, so we can easily drop it.
In [ ]:
    
Features = list(IrisDF)[:-1]
Features
    
Scikit-learn provides a naive Bayes classifier that assumes that the continuous features follow a Gaussian distribution.
In [ ]:
    
from sklearn.naive_bayes import GaussianNB
    
We extract the independent variables and store them in the design matrix X.
In [ ]:
    
X = IrisDF[Features]
    
We extract the dependent variable and store it in Y.
In [ ]:
    
Y = IrisDF['species']
    
We construct a naive Bayes classifier that assumes that the features follow a normal distribution, i.e. that $$ P(f=x \mid C) = \frac{1}{\sqrt{2\cdot\pi\,}\cdot \sigma_{f,C}} \cdot \exp\left(-\frac{\bigl(x-\mu_{f,C}\bigr)^2}{2 \cdot \sigma_{f,C}^2}\right). $$ Here $P(f=x \mid C)$ is the conditional probability density that the feature $f$ has the value $x$ given that $C$ is the species of the flower under investigation, $\mu_{f,C}$ is the mean value of the feature $f$ for the class $C$, and $\sigma_{f,C}^2$ is the variance of the feature $f$ for the class $C$.
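To see what this formula computes, we can estimate the class-conditional means and variances ourselves with pandas and evaluate the density for a single feature value. This is only an illustrative sketch: the helper gauss_density is our own, and the classifier constructed below performs the equivalent computation internally.
In [ ]:
    
# Per-class mean and variance of every feature.
# Note: pandas uses the unbiased variance estimate, so the values can
# differ slightly from those estimated internally by GaussianNB.
mu  = IrisDF.groupby('species')[Features].mean()
var = IrisDF.groupby('species')[Features].var()

def gauss_density(x, mu, var):
    """Gaussian probability density with mean mu and variance var."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Example: the density of the value 5.0 for the first feature,
# conditioned on the first species.
gauss_density(5.0, mu.loc[Species[0], Features[0]], var.loc[Species[0], Features[0]])
    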
In [ ]:
    
classifier = GaussianNB()
    
We train the classifier with our data.
In [ ]:
    
classifier.fit(X, Y)
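    
After fitting, we can inspect the estimated parameters: the attribute classifier.theta_ holds the per-class means $\mu_{f,C}$, while classifier.var_ holds the per-class variances $\sigma_{f,C}^2$.  (In scikit-learn versions before 1.0 the variances are stored in classifier.sigma_ instead.)  A short sketch that displays the means as a DataFrame:
In [ ]:
    
# The rows are the classes, the columns the features; the entries are
# the estimated means.  Use classifier.var_ analogously for the variances.
pd.DataFrame(classifier.theta_, index=classifier.classes_, columns=Features)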
    
We compare the values predicted by the classifier with the actual values and compute the accuracy on the training data.
In [ ]:
    
np.sum(classifier.predict(X) == Y) / len(Y)
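    
Alternatively, the method classifier.score computes this mean accuracy directly.  Keep in mind that in both cases we measure the accuracy on the training data, which tends to overestimate the performance on unseen flowers.
In [ ]:
    
# Equivalent built-in computation of the (training) accuracy.
classifier.score(X, Y)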
    